Digital media authentication

ABSTRACT

A method, system and product including obtaining a media stream depicting a real-time communication of a participant in a communication context; identifying the communication context; obtaining a personalized model of the participant when communicating in the communication context, wherein the personalized model is configured to identify a behavioral pattern of the participant; executing the personalized model on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the media stream matches the behavioral pattern of the participant according to the personalized model; and upon identifying a mismatch between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model, performing a responsive action.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 17/070,198, filed Oct. 14, 2020, entitled Digital Media Authentication, and claims the benefit of provisional patent application No. 62/927,292, entitled “Personalized Media Validation” filed Oct. 29, 2019, which is hereby incorporated by reference in its entirety without giving rise to disavowment.

TECHNICAL FIELD

The present disclosure relates to authenticating digital media in general, and to authenticating communications associated with a person using a personalized model of the person, in particular.

BACKGROUND

Modern techniques enable the creation of fake videos and audio that look convincing and authentic. Such media can be used to generate fake news, to promote disinformation, or the like.

BRIEF SUMMARY

One exemplary embodiment of the disclosed subject matter is a method comprising: obtaining a media stream associated with a participant, wherein the media stream depicts a real-time communication of the participant in a communication context; identifying the communication context; obtaining a personalized model of the participant when communicating in the communication context, wherein the personalized model is configured to identify a behavioral pattern of the participant; executing the personalized model on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the media stream matches the behavioral pattern of the participant according to the personalized model; and upon identifying a mismatch between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model, performing a responsive action.

Optionally, the responsive action comprises generating an alert or blocking the real-time communication, wherein the alert indicates that the media stream is forged.

Optionally, identifying the mismatch comprises determining that a difference between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model exceeds a threshold.

Optionally, the personalized model comprises a classifier that is trained on a dataset, wherein the dataset comprises media records depicting communications of the participant in the communication context.

Optionally, the dataset comprises a first class of media and a second class of media, wherein the first class comprises media records originally depicting the participant in a communication context, wherein the second class comprises media records originally depicting other people excluding the participant in the communication context, the method comprising training the personalized model to classify media as belonging to the first class or to the second class.

Optionally, media fabrication techniques may be implemented on the first class, thereby obtaining processed records of the participant, wherein said media fabrication techniques are configured to replace the participant with different people excluding the participant; media fabrication techniques may be implemented on the second class, thereby obtaining processed records of the other people, wherein said media fabrication techniques are configured to replace the other people; the processed records of the participant may be added to the first class; and the processed records of the other people may be added to the second class.

Optionally, implementing the media fabrication techniques on the second class may comprise superimposing the participant over at least some of the other people.

Optionally, a first personalized model of the participant may be trained under a first communication context, and a second personalized model of the participant may be trained under a second communication context.

Optionally, the communication context may be a friendship relationship, a co-working relationship, a family relationship, a business relationship, a customer-client relationship, a romantic relationship, or the like.

Optionally, the communication context comprises a topic of the real-time communication, or the like.

Optionally, an identity of the participant is determined based on at least one of: a facial recognition method implemented on the media stream, an audio recognition method implemented on the media stream, metadata of the media stream, and tags relating to the participant that are attached to the media stream, wherein the communication context comprises the identity of the participant.

Optionally, identifying the communication context comprises determining a second participant in the real-time communication, wherein the media stream depicts the real-time communication between the participant and the second participant; wherein the communication context is a context of the participant communicating with the second participant; and said obtaining the personalized model comprises obtaining a private model generated based on past communications between the participant and the second participant, wherein the past communications are not publicly accessible.

Optionally, the behavioral pattern of the participant comprises face movements of the participant, face gestures of the participant, a gait of the participant, a walking pattern of the participant, hand movements of the participant, frequently used phrases of the participant, a talking manner of the participant, a voice pattern of the participant, or the like.

Optionally, the method is implemented on a communication system used by a second participant, wherein the communication context is a communication between the participant and the second participant, wherein the communication system is configured to retain communications between the participant and the second participant and to generate a private model for the communication context based on the retained communications.

Another exemplary embodiment of the disclosed subject matter is a computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions when read by a processor, cause the processor to: obtain a media stream associated with a participant, wherein the media stream depicts a real-time communication of the participant in a communication context; identify the communication context; obtain a personalized model of the participant when communicating in the communication context, wherein the personalized model is configured to identify a behavioral pattern of the participant; execute the personalized model on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the media stream matches the behavioral pattern of the participant according to the personalized model; and upon identifying a mismatch between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model, perform a responsive action.

Yet another exemplary embodiment of the disclosed subject matter is a system, the system comprising a processor and coupled memory, the processor being adapted to: obtain a media stream associated with a participant, wherein the media stream depicts a real-time communication of the participant in a communication context; identify the communication context; obtain a personalized model of the participant when communicating in the communication context, wherein the personalized model is configured to identify a behavioral pattern of the participant; execute the personalized model on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the media stream matches the behavioral pattern of the participant according to the personalized model; and upon identifying a mismatch between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model, perform a responsive action.

THE BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The present disclosed subject matter will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which corresponding or like numerals or characters indicate corresponding or like components. Unless indicated otherwise, the drawings provide exemplary embodiments or aspects of the disclosure and do not limit the scope of the disclosure. In the drawings:

FIG. 1 shows a schematic illustration of an exemplary environment in which the disclosed subject matter may be utilized, in accordance with some exemplary embodiments of the disclosed subject matter;

FIG. 2 shows a flowchart diagram of a method, in accordance with some exemplary embodiments of the disclosed subject matter;

FIG. 3 shows a flowchart diagram of a method, in accordance with some exemplary embodiments of the disclosed subject matter; and

FIG. 4 shows a block diagram of an apparatus, in accordance with some exemplary embodiments of the disclosed subject matter.

DETAILED DESCRIPTION

One technical problem dealt with by the disclosed subject matter is authenticating an identity of a participant whose features are incorporated in digital media such as a video, an audio stream, or the like. In some exemplary embodiments, digital media may be manipulated using one or more media fabrication technologies such as deepfake (a portmanteau of “deep learning” and “fake”) techniques. In some exemplary embodiments, the digital media may be manipulated by replacing a first person with another person, such as by superimposing a target person on a video that originally depicted an original person, by superimposing a voice of one person on a given audio where a different, original, person is speaking, or the like. In some exemplary embodiments, it may be desired to determine if a currently displayed person was originally captured in the video or whether he was superimposed on the video by a media manipulator. Similarly, it may be desired to determine if a person's voice in an audio is authentic or was generated.

In some exemplary embodiments, manipulation techniques of digital media may be configured to fabricate audio and video, e.g., using artificial intelligence, deepfake techniques, or any other manipulation technique. In some cases, digital media may be fabricated utilizing a Generative Adversarial Network (GAN) that is configured to generate manipulated media at a generative network, and to evaluate the generated media at a discriminative network, e.g., thereby enhancing a level of the fabrication. In some exemplary embodiments, fabricated media may be created using a plurality of alternative techniques, such as by generating a fake video of a participant based on one or more pictures of the participant, by creating synthetic media, or the like.

In some exemplary embodiments, media manipulations may include manipulating Augmented Reality (AR) layers, which may enable a user to switch between different people captured in an image, to modify faces, to switch between captured objects, or the like. In some cases, human image synthesis may be performed, e.g., to show victims saying or doing things that they never said or did. In some exemplary embodiments, existing images and videos may be combined with or superimposed onto source images or videos, thereby making it possible to replace one participant with another, lip-sync a filmed person to a given audio, swap a certain filmed face with another face, or the like.

In some exemplary embodiments, manipulated media may be misused, for example, for creating fake news, theft, identity fraud, or the like. This may be done by depicting one or more people saying things or performing actions that never occurred in reality, such as altering the words or gestures of a figure to make it look like he said or did something that he had not.

Another technical problem dealt with by the disclosed subject matter is to eliminate digital fabrication attacks, such as deepfake attacks, which may be used for spoofing communications such as phone calls, videos, audio messages, or the like. In some cases, an attack scenario may include spoofing a phone number to forge a phone call, an audio message, or a video-based communication such as a video conference, for example, with a familiar person. As an example, a recipient may receive a forged phone or video call from his boss and, believing it is actually his boss, may follow instructions provided by the manipulator during the forged call.

Yet another technical problem dealt with by the disclosed subject matter is to authenticate an identity of a participant in a real-time media stream, such as a real-time telephone call, video call, or any other digital communication medium. In some exemplary embodiments, it may be desired to determine whether real-time communications have been tampered with or are authentic. In some exemplary embodiments, every individual participant may be characterized by one or more unique behavioral patterns, which may include unique face movements, gait, walking patterns, voice patterns, hand movements, or the like. In some exemplary embodiments, manipulation techniques may typically replace a first person with a second person by causing the second person to imitate behavioral patterns of the first person. In some exemplary embodiments, manipulation techniques may not necessarily be able to accurately imitate unique behavioral patterns of a person whose features are used to forge digital media.

For example, a video incorporating a woman named Alice may be manipulated to replace Alice with a man named Bob, e.g., using deepfake techniques or any other media manipulation technique. However, it may be difficult to replace the body language of Alice with the body language of Bob, since the body language of a participant may be unique, subtle, and difficult to identify, and even more difficult to fabricate. For example, Bob may experience very frequent eye twitching, while Alice's eyes may twitch only on rare occasions. In some cases, a manipulated video may depict Bob twitching his eyes in the same pattern as Alice, contrary to his typical body language. In another example, Bob may limp and have walking patterns that are very different from Alice's, but the manipulated video may depict Bob walking in a pattern similar to Alice's, without limping. As another example, the behavior of Alice may be characterized by generally symmetrical body gestures, while that of Bob may not. As a result, the manipulated video may depict Bob performing symmetrical body gestures that are uncharacteristic of him. In another example, Alice may stutter every time she says “apple” or starts a sentence. According to this example, a deepfake may produce a forged audio of Alice by superimposing Alice's voice on an audio of Bob. However, the forged audio may not include the stuttering that is typical of Alice.
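The eye-twitching example can be made concrete with a small numeric check. The following is a minimal sketch, assuming that twitch events have already been detected in the clip and that a few authentic clips of Bob provide a baseline rate; the function names, the baseline numbers, and the cut-off of three standard deviations are all hypothetical and are not taken from the disclosure.

```python
from statistics import mean, stdev


def events_per_minute(event_times_s, duration_s):
    """Rate of a discrete behavioral event (e.g., eye twitches) in a clip."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(event_times_s) * 60.0 / duration_s


def rate_z_score(observed_rate, baseline_rates):
    """Distance of the observed rate from the person's baseline, in standard deviations."""
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)  # needs at least two baseline clips
    if sigma == 0:
        return 0.0 if observed_rate == mu else float("inf")
    return abs(observed_rate - mu) / sigma


# Hypothetical baseline: twitches per minute measured in authentic clips of Bob.
bob_baseline = [11.0, 12.5, 10.0, 13.0, 12.0]

# Hypothetical 60-second clip in which "Bob" twitches only twice, as Alice might.
observed = events_per_minute([5.2, 41.7], duration_s=60.0)
z = rate_z_score(observed, bob_baseline)
suspicious = z > 3.0  # assumed cut-off, not taken from the source
print(f"observed={observed:.1f}/min, z={z:.1f}, suspicious={suspicious}")
```

A single behavioral feature is used here only for readability; the disclosure contemplates many patterns (gait, voice, phrasing), each of which could be compared against the same person's baseline in a similar way.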

One technical solution provided by the disclosed subject matter may be authenticating an identity of a participant whose features are incorporated in digital media by using a personalized, context-sensitive model of the participant. In some exemplary embodiments, a personalized model of the participant may be trained to identify behavioral patterns of the participant with respect to a certain communication context, e.g., an identity of the participant, a type of relationship between the communicating people, a topic of the communication, or the like. In some exemplary embodiments, the context may relate to an interaction, a type of the interaction, a role of a participant in the interaction, or the like. In some exemplary embodiments, each context of communication associated with the participant may require a separate personalized model, e.g., one trained on media of that same context. Additionally or alternatively, a single personalized model may be configured to be context-sensitive, and may be utilized for a plurality of contexts. In some exemplary embodiments, the personalized model may include a machine learning model, a deep learning model, a classifier, a predictor, or the like.

In some exemplary embodiments, every participant may be characterized by unique behavioral patterns that may be expressed in corresponding situations and scenarios. For example, a participant may be characterized by a certain unique behavioral pattern when interacting with children, which may differ from his behavioral pattern when interacting with his boss, his subordinates, his spouse, or the like. In some cases, behavioral patterns of the same participant may differ in intonation, tone, assertiveness level, body language, vocabulary, facial expressions, or the like, when interacting with different entities or entity types.

In some exemplary embodiments, within the same context, behavioral patterns of a person may be consistent. In some exemplary embodiments, a typical interaction of a person may be classified as belonging to one or more common categories, such as an interaction with a boss, a spouse, a child, a co-worker, or the like. For example, an interaction of a person with a child may be characterized by a certain body language that is unique to the person when communicating with children, e.g., certain voice intonations, tones, or the like. An interaction of the same person with co-workers may be characterized by a slightly different body language, voice intonations, tones, or the like. In some exemplary embodiments, an interaction of a person may be classified according to the specific identity of the person with whom he is interacting. In some exemplary embodiments, within the same category of relationships, behavioral patterns of a person may vary. For example, the person may interact differently with different co-workers, although they may belong to the same category. In some exemplary embodiments, an interaction of a person may be classified according to a specific conversation topic that is discussed. In some exemplary embodiments, within the same interaction, behavioral patterns of a person may vary according to a topic or an emotional state of the person.

In some exemplary embodiments, although many behavioral patterns of a person may alter in different scenarios, other behavioral patterns of the same person may stay consistent regardless of the specific scenario. For example, a stutter of a person may be consistent in different situations and scenarios. As another example, some subconscious behaviors may be consistent in different contexts, as the person may not be able to control them.

In some exemplary embodiments, based on unique behavioral patterns of a participant in a specific communication context, a personalized model of the participant in the context may be generated, created, or the like. In some exemplary embodiments, the personalized model of the participant may be generated for a user communicating with the participant. In some exemplary embodiments, the personalized model of the participant may be configured to identify one or more behavioral patterns of the participant, and compare them with identified behavioral patterns of the participant in real-time digital media. In some exemplary embodiments, the comparison may indicate whether the depicted participant is an authentic version of himself, or whether the depicted participant is a fabricated version of the participant.

In some exemplary embodiments, media manipulation or deepfake techniques may be able to utilize publicly available media of a public figure, e.g., the president, and train a personalized model of the president based thereon, in order to create fabricated media that matches the behavioral patterns of the public figure as identified in the public media. In some exemplary embodiments, the public figure may act differently in different contexts, so that the public media may not be accurate for some non-public communication contexts of the same public figure. For example, a personalized model of the president that is trained on public speeches of the president may not be able to fabricate accurate behavioral patterns of the president when speaking with his child. In some exemplary embodiments, as media manipulation techniques may not have access to private media of the public figure, it may be more difficult for forging entities to train a model to fabricate the public figure's behavioral patterns in non-public contexts of communication.

In some exemplary embodiments, with respect to private people, there may not be a large amount of publicly accessible media depicting their behavior, e.g., an amount that is large enough to train a personalized model. In some exemplary embodiments, as media manipulation techniques may not have access to private media of private people in different communication contexts, it may be difficult to train a fraud model to fabricate the private person's behavioral patterns in one or more contexts of communication, e.g., in order to spoof a call. In some exemplary embodiments, communication media records of a person in non-public contexts may typically be available at a user device communicating with the person. In some exemplary embodiments, personalized models may be trained to identify a person with whom the user is communicating in non-public contexts, based on media of the person that is not publicly accessible.

In some exemplary embodiments, a media stream associated with a participant, such as a real-time media stream, may be obtained. In some exemplary embodiments, the media stream may be obtained from a media source such as a broadcast, a radio-based communication, a phone call, a video call, a FACEBOOK™ live stream, a YOUTUBE™ live stream, or the like. In some exemplary embodiments, the media stream may be captured or displayed at a computing device of a user, e.g., via an application of the device, a camera of the device, a browser of the device, or the like. In some exemplary embodiments, the media stream may depict a communication of the user with a participant in a specific communication context, e.g., during an interaction of a certain type. In some exemplary embodiments, the media stream may or may not be publicly accessible, publicly available, or the like.

In some exemplary embodiments, the communication context of the participant, as depicted in the media stream, may be identified. In some exemplary embodiments, the communication context may be a context of the interaction between the user and the participant. In some exemplary embodiments, the communication context may comprise a friendship relationship, a co-working relationship, a romantic relationship, a family relationship, a business relationship, a customer-client relationship, or the like. In some exemplary embodiments, the communication context may comprise the identity of the participant. In some exemplary embodiments, the communication context may comprise an emotional state or a topic of conversation. In some exemplary embodiments, the media stream may depict the communication between the user, the participant, and any additional participants.

In some exemplary embodiments, the communication context may be identified based on identities of participants of the interaction, such as an identity of the participant in the communication, based on a manner of referring to each participant, based on attributes of the interaction such as a level of formality, based on classifications of the interaction, or the like. In some exemplary embodiments, the identity of the participant may be determined based on a facial recognition method implemented on the media stream, an audio recognition method implemented on the real-time media stream, tags relating to the participant that are attached to the real-time media stream, or the like.
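One possible way to combine these identity cues into a communication context is sketched below. It is a minimal illustration only, assuming that hypothetical face and voice recognizers each emit `(name, confidence)` pairs, that stream tags are treated as strong evidence, and that the relationship type is looked up in a contact book on the user's device. The names `CommunicationContext`, `resolve_identity`, and `identify_context`, the scoring rule, and the `min_score` value are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommunicationContext:
    participant: str    # resolved identity of the remote participant
    relationship: str   # e.g., "co-working", "family", "friendship"
    formality: str      # "formal" or "informal"


def resolve_identity(candidates, tags, min_score=0.8):
    """Combine identity evidence into a single participant name, or None.

    candidates: (name, confidence) pairs, e.g., from hypothetical face and
    voice recognizers run on the stream; tags: names attached to the stream.
    """
    scores = {}
    for name, confidence in candidates:
        scores[name] = scores.get(name, 0.0) + confidence
    for name in tags:
        scores[name] = scores.get(name, 0.0) + 1.0  # assumed: a tag counts as strong evidence
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


def identify_context(identity, contacts, formal_cues_seen):
    """Build the context from the identity, the user's contacts, and formality cues."""
    if identity is None:
        return None
    relationship = contacts.get(identity, "unknown")
    formality = "formal" if formal_cues_seen else "informal"
    return CommunicationContext(identity, relationship, formality)


# Hypothetical contact book kept on the user's device.
contacts = {"Alice": "co-working", "Carol": "family"}
identity = resolve_identity([("Alice", 0.72), ("Alice", 0.65), ("Bob", 0.2)], tags=[])
print(identify_context(identity, contacts, formal_cues_seen=True))
```

Summing confidences across recognizers is only one simple fusion rule; a weighted vote or a learned combiner would fit the same interface.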

In some exemplary embodiments, a personalized model of the participant, when communicating in the communication context, may be obtained. In some exemplary embodiments, the personalized model may be configured to identify one or more behavioral patterns of the participant, which may be matched against behavioral patterns of the real-time communication. In some exemplary embodiments, the behavioral patterns of the participant may comprise face movements of the participant, face gestures of the participant, a gait of the participant, a walking pattern of the participant, hand movements of the participant, frequently used phrases of the participant, a talking manner of the participant, a voice pattern of the participant, or the like.

In some exemplary embodiments, obtaining the personalized model may include obtaining a private model generated based on past communications or interactions, such as private or non-public communications between the user and the participant (e.g., past video conferences between the user and the participant; phone calls between the user and the participant; or the like), a category of the participant, or the like. In some exemplary embodiments, the media stream may depict a real-time communication of the user with the participant, which may be identified as the communication context. In some exemplary embodiments, the past communications, as well as the private model, may not be publicly accessible, e.g., as they may include communications between the participant and the user, which may be sensitive by nature. In some exemplary embodiments, one or more personalized models in the possession of the user may be examined to identify therein a personalized model of the participant that matches the identified context. In some exemplary embodiments, existing personalized models belonging to the user may be scored against the identified context, and the personalized model with the highest score may be selected. In some exemplary embodiments, the matching model may be retrieved, obtained, or the like.

In some exemplary embodiments, in case all the existing personalized models belonging to the user have a matching score that is below a matching threshold, it may be determined that the participant does not have a trained personalized model that matches the identified context. In some exemplary embodiments, in case the participant does not have a trained model that matches the identified context, none of the existing personalized models may be retrieved, utilized, or the like. In some exemplary embodiments, in case the participant does not have a trained model that matches the identified context, but does have one or more trained models in a different context, the model with the highest similarity score may be selected and retrieved. Additionally or alternatively, in case the participant does not have a trained model that matches the identified context, a general model of the participant's behavioral patterns that is not associated with a specific context may be utilized.
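The scoring, threshold, and fallback steps described in the two preceding paragraphs can be sketched as follows. This is a minimal sketch, assuming that each stored model is keyed by the participant and a set of context attributes and that the score is the Jaccard similarity between attribute sets; neither the similarity measure, the `threshold` value, nor the fallback order (general model before the closest other-context model) is specified by the source, and `select_model` is a hypothetical name.

```python
def jaccard(a, b):
    """Similarity between two sets of context attributes (1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_model(models, participant, context, threshold=0.5, general_models=None):
    """Pick a personalized model for the participant in the identified context.

    models: dict mapping (participant, frozenset of context attributes) -> model.
    general_models: optional dict mapping participant -> context-free model.
    Returns (model, reason); model is None when nothing suitable exists.
    """
    scored = [
        (jaccard(model_context, context), model)
        for (who, model_context), model in models.items()
        if who == participant
    ]
    best_score, best_model = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    if best_model is not None and best_score >= threshold:
        return best_model, "matching context"
    # Below the matching threshold: assumed fallback order, general model first.
    if general_models and participant in general_models:
        return general_models[participant], "general model"
    if best_model is not None:
        return best_model, "closest other context"
    return None, "no model"


models = {
    ("Alice", frozenset({"co-working", "formal"})): "alice_work_model",
    ("Alice", frozenset({"friendship", "informal"})): "alice_friend_model",
}
print(select_model(models, "Alice", {"co-working", "formal", "budget"}))
print(select_model(models, "Alice", {"family", "informal"}))
```

The strings stand in for trained models; in practice the dictionary values would be whatever model objects the user's device retains.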

In some exemplary embodiments, the personalized model may be executed, processed, or the like, on at least a portion of the real-time media stream, e.g., in order to determine whether a behavioral pattern of the participant in the real-time media stream matches the behavioral pattern of the participant according to the personalized model. In some exemplary embodiments, the personalized model may be trained to indicate whether or not the behavioral patterns of a participant in a communication stream match the behavioral pattern of the participant as learned by the model.

In some exemplary embodiments, upon identifying a mismatch between the behavioral pattern of the participant in the real-time media stream and the behavioral pattern of the participant according to the personalized model, a responsive action may be performed. In some exemplary embodiments, the mismatch may be identified, e.g., at the personalized model, by determining that a difference between the behavioral pattern of the participant in the real-time media stream and the behavioral pattern of the participant according to the personalized model exceeds a threshold. In some exemplary embodiments, the responsive action may comprise generating an alert, a notification, or the like, e.g., indicating that the real-time media stream is forged. In some exemplary embodiments, the responsive action may comprise blocking the real-time communication.
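A minimal sketch of the threshold test and the responsive action follows. It assumes that behavioral patterns are summarized as named numeric features, that the personalized model supplies an expected value and a typical spread for each feature, and that the "difference" is the root-mean-square of the normalized deviations. The feature names, numbers, and the use of two thresholds (a lower one for an alert, a higher one for blocking) are hypothetical; the source only states that a difference exceeding a threshold may trigger an alert or blocking.

```python
import math


def mismatch_score(observed, expected, scale):
    """Root-mean-square of per-feature deviations, each in units of `scale`.

    observed: behavioral features measured in the stream.
    expected, scale: the participant's typical value and spread per feature,
    standing in here for what the personalized model has learned.
    """
    deviations = [(observed[k] - expected[k]) / scale[k] for k in expected]
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def responsive_action(score, alert_threshold=2.0, block_threshold=4.0):
    """Map the mismatch score to an action; both thresholds are assumptions."""
    if score > block_threshold:
        return "block"
    if score > alert_threshold:
        return "alert"
    return "allow"


# Hypothetical learned profile: eye twitches per minute and words per second.
expected = {"twitch_rate": 11.7, "speech_rate": 2.5}
scale = {"twitch_rate": 1.2, "speech_rate": 0.3}

authentic = {"twitch_rate": 11.0, "speech_rate": 2.6}
forged = {"twitch_rate": 2.0, "speech_rate": 2.5}
for name, features in [("authentic", authentic), ("forged", forged)]:
    score = mismatch_score(features, expected, scale)
    print(name, round(score, 2), responsive_action(score))
```

Normalizing each deviation by the participant's own spread keeps features with large natural variation from dominating the score.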

In some exemplary embodiments, the personalized model may comprise a classifier that is trained on a dataset. In some exemplary embodiments, the dataset may comprise media records depicting communications of the user with the participant in the communication context. In some exemplary embodiments, the dataset may comprise a first class of media and a second class of media. In some exemplary embodiments, the first class may comprise media records originally depicting the participant in a communication context. In some exemplary embodiments, the second class may comprise media records originally depicting other people excluding the participant in the communication context or in other communication contexts. In some exemplary embodiments, the personalized model may be trained to classify media as belonging to the first class or to the second class.
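The two-class training set can be illustrated with a deliberately simple classifier. The source does not name a classifier type; the nearest-centroid classifier below is a swapped-in stand-in chosen only because it fits in a few lines. It assumes each media record has already been reduced to a hypothetical behavioral feature vector, and that features are on comparable scales (a real system would likely normalize them first).

```python
from math import dist
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


class NearestCentroidClassifier:
    """Two-class classifier: class 1 = the participant, class 2 = other people."""

    def fit(self, first_class, second_class):
        self.first_centroid = centroid(first_class)
        self.second_centroid = centroid(second_class)
        return self

    def predict(self, features):
        d1 = dist(features, self.first_centroid)
        d2 = dist(features, self.second_centroid)
        return 1 if d1 <= d2 else 2


# Hypothetical feature vectors: [eye twitches per minute, words per second].
participant_records = [[11.0, 2.4], [12.5, 2.6], [12.0, 2.5]]
other_people_records = [[2.0, 3.1], [3.5, 2.9], [1.5, 3.3]]

model = NearestCentroidClassifier().fit(participant_records, other_people_records)
print(model.predict([11.8, 2.5]), model.predict([2.2, 3.0]))
```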

In some exemplary embodiments, deepfake or any other media manipulation techniques may be implemented on the first class, thereby obtaining processed records of the participant. In some exemplary embodiments, the manipulation techniques may be configured to replace the participant with different people excluding the participant, e.g., thereby creating forged media of the participant. In some exemplary embodiments, the manipulation techniques may be configured to replace the participant with the same participant, e.g., using a different image of the participant.

In some exemplary embodiments, manipulation techniques may be implemented on the second class, thereby obtaining forged records of the other people. In some exemplary embodiments, the manipulation techniques may be configured to replace the other people, e.g., with different people, with each other, with the participant, or the like. In some exemplary embodiments, the participant may be superimposed over at least some of the other people. In some exemplary embodiments, the forged records of the participant may be added to the first class, while the forged records of the other people may be added to the second class. This way, each record keeps the class of the person who originally performed the depicted behavior, regardless of whose appearance it shows, so the classifier may be forced to identify the behavioral patterns of the participant rather than his appearance.
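The labeling rule behind this augmentation can be sketched with plain records. In this minimal sketch, assuming a record can be described by who originally performed the behavior and whose appearance it shows, `dataclasses.replace` merely stands in for running a real media fabrication tool; `MediaRecord` and `augment` are hypothetical names. The point illustrated is that swapped copies keep their original label, so appearance alone no longer separates the two classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MediaRecord:
    behavior_source: str  # person who originally performed the captured behavior
    appearance: str       # person whose face or voice the record shows
    label: int            # 1 = participant class, 2 = other-people class


def augment(first_class, second_class, participant, other_people):
    """Add appearance-swapped copies while keeping every record's original label.

    `replace(record, appearance=...)` is a stand-in for running a real media
    fabrication tool; only whose appearance is shown changes.
    """
    processed_first = [
        replace(record, appearance=person)
        for record in first_class
        for person in other_people
        if person != participant
    ]
    processed_second = [replace(record, appearance=participant) for record in second_class]
    return first_class + processed_first, second_class + processed_second


first = [MediaRecord("Alice", "Alice", 1)]
second = [MediaRecord("Carol", "Carol", 2), MediaRecord("Dan", "Dan", 2)]
first_aug, second_aug = augment(first, second, "Alice", ["Carol", "Dan"])
for record in first_aug + second_aug:
    print(record)
```

After augmentation, Alice's appearance occurs in both classes, and the only consistent difference between them is whose behavior was captured.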

In some exemplary embodiments, different personalized models for different participants may be trained to identify the participants in corresponding communications with the user. In some exemplary embodiments, a user may retain a personalized model for one or more participants with whom he communicates, and each such participant may be considered to be a separate communication context. For example, communications of the user with a first person may be used to train a personalized model of the first person, and communications of the user with a second person may be used to train a personalized model of the second person. Based on the first and second models, any incoming call or other communication with the first or second person may be authenticated.

In some exemplary embodiments, different personalized models of the same participant may be trained to identify the participant in corresponding communication contexts, e.g., using corresponding datasets. In some exemplary embodiments, a first personalized model of the participant may be trained under a first communication context, a second personalized model of the same participant may be trained under a second communication context, and so on. For example, a first personalized model of the participant may be trained for interactions relating to monetary issues, while a second personalized model of the participant may be trained for interactions relating to family issues.

In some exemplary embodiments, a communication system of the user may be used to communicate between the user and the participant. In some exemplary embodiments, the communication system may be configured to retain communications between the user and the participant. In some exemplary embodiments, the communication system may be configured to generate a private or personalized model of the participant in the communication context, e.g., based on the retained communications between the user and the participant. In some exemplary embodiments, one or more aspects of the current subject matter may be implemented by the communication system.

One technical effect of utilizing the disclosed subject matter may be to authenticate digital media such as digital audio, videos, or the like, in a communication context, as being authentic or manipulated. In some exemplary embodiments, a classification of digital media as authentic or manipulated may be provided for any digital media stream incorporating features of a participant interacting in a communication context. In some exemplary embodiments, the disclosed subject matter enables determining whether a currently displayed participant was originally captured in the video or was superimposed on the video by a manipulator. The disclosed subject matter enables authenticating the identity of a participant in a real-time media stream, such as a real-time telephone call, video call, or any other digital communication medium, as well as in any other media file.

Another technical effect of utilizing the disclosed subject matter is to block media fabrication attacks used for spoofing communications such as phone calls, videos, audio messages, or the like. In some cases, the disclosed subject matter may be utilized to classify, in real time, whether incoming calls or video-based communications are authentic.

One technical problem dealt with by the disclosed subject matter is to authenticate that an identity of a participant is not being exploited in digital media. In some cases, it may be desired to monitor a digital media platform or network to identify media that portrays a person of interest, and authenticate it. In some cases, it may be desired to know whether or not malicious entities are exploiting the person's identity in one or more forged videos or audios.

Another technical problem dealt with by the disclosed subject matter is to limit GAN techniques (also referred to as “coevolution techniques”) which may be used by a forging person or entity to improve a media fabrication capability. In some exemplary embodiments, coevolution techniques may be used as part of a technological “arms race” between deepfake creation techniques and deepfake detection techniques. In some exemplary embodiments, using coevolution or GAN techniques, as well as classification results of the personalized model, the forging person may be able to train his GAN framework to learn how to get fraudulent media authenticated and to bypass dedicated detectors such as the personalized model. In some cases, GAN techniques may be used to learn how to generate a fraudulent media file that bypasses the personalized model and is authenticated thereby without being detected as fraud. In some exemplary embodiments, a forging entity may train a GAN framework to utilize classification results of the personalized model for enhancing its ability to avoid being spotted by the personalized model. In some exemplary embodiments, the GAN techniques may utilize classification results of the personalized model, which may be used as a deepfake detector, to learn how to avoid being spotted by the detector. For example, the GAN framework may create a plurality of different deepfake videos or audios and score them according to classification results from the personalized model, e.g., which may be provided by directly accessing the personalized model or by analyzing responsive actions.

One technical solution provided by the disclosed subject matter may be to continuously or intermittently monitor digital media of communication mediums via social media sites in order to detect fraudulent media that depicts a person of interest. In order to reduce a risk of being exploited by one or more GAN frameworks, access to the personalized model may be limited, e.g., by placing the personalized model and its associated dataset in a secure location, or by securing the personalized model in any other manner, thereby limiting any direct usage of the personalized model. In some exemplary embodiments, the personalized model may be kept private without being publicly accessible or available to untrusted third parties. Accordingly, it may be more difficult for a forging entity that wishes to forge media to use GAN frameworks to comply with the authentication requirements of the personalized model.

Additionally or alternatively, the personalized model may limit its scope to communication mediums, thereby reducing the amount of available classified media. In some exemplary embodiments, the personalized model may only classify real-time communications such as telephone conversations, video calls, or the like, without attempting to authenticate uploaded media.

Additionally or alternatively, the personalized model may limit its scope to social media platforms. In some exemplary embodiments, the personalized model may evaluate only viral media in social media platforms, e.g., media that complies with a popularity criterion, media shared by non-suspicious profiles, or the like. In some cases, fraudulent users in social media sites may be identified by detecting a source of shared deepfake media, reporting suspicious profiles to corresponding social media sites, or the like, and may not be considered when determining compliance with the popularity criterion.

One technical effect of utilizing the disclosed subject matter is to verify that an identity of a participant is not being exploited in digital media. In some cases, a network may be monitored to find media that portrays a person of interest, and the disclosed subject matter may be utilized to authenticate the media using the personalized model of the person.

Another technical effect of utilizing the disclosed subject matter is limiting GAN techniques that are used to improve a media fabrication capability. In some exemplary embodiments, as the disclosed subject matter secures the personalized model and limits the scope of its classifications, the GAN techniques may not have sufficient access to results of the personalized model, and therefore may not be able to utilize those classification results to improve a forging capability.

Referring now to FIG. 1 showing an illustration of a computerized environment, in accordance with some exemplary embodiments of the disclosed subject matter.

In some exemplary embodiments, Environment 100 may comprise a Device 110 of a user. Device 110 may be a smartphone, a smartwatch, a tablet, a Personal Computer (PC) or the like. Device 110 may comprise an Operating System (OS), a processor, a receiver, a transmitter, a memory, a network interface, or the like. Device 110 may be used for displaying, capturing, obtaining, or the like, digital media streams such as audio communications, video communications, voice messages, or the like, and authenticating them.

In some exemplary embodiments, Media Source 120 may be a digital media provider of Device 110, such as a server providing media such as radio or television broadcasting to Device 110, a display screen displaying media that can be captured by Device 110, or the like. Device 110 may be connected to a network, for example, through a BLUETOOTH™ connection, a WIFI™ connection, a local cellular connection, a Local Area Network (LAN), a Wide Area Network (WAN), or the like, and may obtain the digital media from a server via the network. In some exemplary embodiments, one or more applications, browsers, or the like of Device 110 may obtain the media stream from the server. In some exemplary embodiments, Media Source 120 may be a digital media display such as a television screen, which may be captured by Device 110, e.g., via a camera of Device 110. In some exemplary embodiments, the digital media may depict a communication between the user and Participant 150.

In some exemplary embodiments, Device 110 of the user may include a Personalized Model 130, which may be a private model of Participant 150 communicating with the user under a communication context. For example, Personalized Model 130 may include a model of Participant 150 with whom the user of Device 110 is communicating, e.g., a friend of the user, a co-worker of the user, a spouse of the user, a family member of the user, or the like. In some exemplary embodiments, Personalized Model 130 may be configured to authenticate Participant 150 in the communication context, e.g., in real time, as the authentic participant or as a fraud. For example, upon receiving at Device 110 an audio or video call from Participant 150, who may be the father of the user, a Personalized Model 130 of Participant 150 in the communication context of the interaction may be executed on at least a portion of the call in order to authenticate the identity of Participant 150. As an example, this may prevent a deepfake attack that attempts to forge Participant 150's identity in order to rob the user.

In some exemplary embodiments, Personalized Model 130 may be located at Device 110. Alternatively, Personalized Model 130 may be located at a server, a cloud network, or the like, and may be accessible to Device 110 in order to enable Device 110 to authenticate digital media of the participant in the certain communication context. In some exemplary embodiments, Personalized Model 130 may be trained on a Dataset 140, which may include digital media of Participant 150 in the communication context. In some exemplary embodiments, Dataset 140 may include, for each model of a participant, previous interactions of the participant with the user of Device 110 in one or more communication contexts.

Referring now to FIG. 2 illustrating a flowchart diagram of a method, in accordance with some exemplary embodiments of the disclosed subject matter.

On Step 210, a media stream associated with a participant may be obtained, e.g., in real time. In some exemplary embodiments, the media stream may depict a real-time communication of a user with the participant in a communication context. In some exemplary embodiments, the digital media may be obtained, for example, from a social media platform, a communication medium, or the like. For example, the digital file may be obtained during a real-time communication from a video chat, a video conference, a phone call, a SKYPE™ session, or the like.

In some exemplary embodiments, the media stream may be identified as being associated with the participant based on identified features of the participant, a tag of the participant, metadata of the media stream such as a retained name of the participant, a combination thereof, or the like. For example, the metadata of the media stream may include a predefined name of the device with which the user is communicating, which may be assigned or defined by the user or by any other entity. In some exemplary embodiments, features of the participant may be identified using one or more facial recognition methods, audio recognition methods, or the like.
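One simple way to combine the association signals listed above is to prefer a user-assigned device name from the stream metadata and fall back to a recognition result when the metadata yields nothing. The sketch below assumes a hypothetical metadata key `"device_name"` and a hypothetical user-maintained mapping from device names to participants; the recognizer itself is out of scope and is represented only by its output.

```python
from typing import Dict, Optional


def associate_participant(
    metadata: Dict[str, str],
    device_names: Dict[str, str],
    recognized_identity: Optional[str] = None,
) -> Optional[str]:
    """Return the participant a media stream is associated with, if any.

    metadata: stream metadata; the "device_name" key is a hypothetical field.
    device_names: hypothetical user-assigned mapping of device names to
        participants (e.g., "Dad's phone" -> "father").
    recognized_identity: optional output of a face or voice recognizer.
    """
    from_metadata = device_names.get(metadata.get("device_name", ""))
    if from_metadata is not None:
        return from_metadata
    # No usable metadata: fall back to whatever recognition produced.
    return recognized_identity
```

The precedence order here is a design choice for illustration; a stricter variant could require the metadata and the recognizer to agree before associating the stream with the participant.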

In some exemplary embodiments, the media stream may depict the user interacting with the participant, with a category of people including the participant, in a certain type of interaction with the participant, or the like. For example, the media stream may include audio of a phone call between the user and his best friend (e.g., the participant), or audio of a phone call between the user and a child of any identity (e.g., a category of the participant). In some exemplary embodiments, the participant may include a participant of a known, specific, and/or predetermined identity or category. In some exemplary embodiments, the participant may be part of a category of people such as people that are children, people that are friends of the user, people that are loud, customer service employees, or any other category of people that is defined as being dependent on the user or that is defined independently from the user. In some exemplary embodiments, a certain type of interaction may include, e.g., an interaction that is associated with business matters, an interaction that is associated with family matters, a calm interaction, a dispute, or the like.

On Step 220, the communication context of the participant may be identified. In some exemplary embodiments, the communication context may comprise an identity of the participant; a type of relationship between the participant and the user, such as a friendship relationship, a co-working relationship, a romantic relationship, a family relationship, a business relationship, a customer-client relationship, or the like; a type of interaction between the participant and the user; or the like. In some cases, Step 210 may be implemented before or after Step 220. For example, digital media may be obtained based on an identification of a communication context of the participant therein, or vice versa, e.g., after obtaining the digital media, the communication context may be identified.

In some exemplary embodiments, one or more participants with whom the user is interacting in the digital media may be identified using one or more facial recognition methods, audio recognition methods, tags, references, metadata, or any other digital media recognition technique. In some exemplary embodiments, a type of interaction may be identified based on one or more classifiers, tags, references, or the like. In some exemplary embodiments, several participants may be identified in the same digital media, such as in different segments thereof, different visible portions of the same segment, or the like. In some exemplary embodiments, Steps 230-250 may be applied with respect to each identified participant in the digital media with whom the user is interacting.

On Step 230, a personalized model of the participant in communication with the user, e.g., in the communication context, may be obtained. In some exemplary embodiments, the personalized model may be configured to compare one or more typical behavioral patterns of the participant with identified patterns of the participant in digital media streams. In some cases, the personalized model may be trained to identify movement patterns of the participant such as a certain gait of the participant, how the participant moves, how the participant's face moves, frequently used phrases, typical face gestures, a symmetry level of face gestures, how the participant talks, or the like.

In some exemplary embodiments, the personalized model of the participant may be retrieved upon identifying a corresponding personalized model in the corresponding context. In some exemplary embodiments, in case the personalized model, which may be retained in the user's possession, is determined to match the participant as well as the context of the media stream, the media stream or portions thereof may be provided to the personalized model for classification, e.g., in order to determine whether or not the digital media is authentic. In some exemplary embodiments, the personalized model of the participant may be retrieved upon determining that the user's device has a corresponding personalized model. For example, the context may include a type of a communication session between the user and the participant, an identity of the participant, a type of an identity of the participant, a type of digital media, a topic of communication that is associated with the digital media, or the like.

In some exemplary embodiments, the media stream may include a media file depicting the participant in a plurality of contexts, some of which are communication contexts for which a personalized model exists. In some cases, media such as a video may depict the participant performing one or more activities and communications, which may include a communication having a communication context for which a model of the participant exists. In such cases, the portions of the video associated with the context may be extracted and authenticated using the model. For example, a model of the participant when communicating with children may be retained, and the video may include one or more portions in which the participant communicates with children. Such portions may be used to authenticate the identity of the participant, e.g., as forging entities may not have access to sufficient data depicting the participant communicating with children.

In some exemplary embodiments, in case the digital media is determined to have a different context from the communication context of the personalized model, the difference may be scored and compared to a context threshold, in order to determine whether the personalized model may be utilized for classifying the media. In some exemplary embodiments, a difference between communication contexts may relate to a communication medium, an identity of a participant to the communication, an identity type of participants to the communication, a topic type of the communication such as standup or politics, or the like. For example, in case the communication context includes a communication relating to business topics, and the digital media depicts a communication relating to chores, the difference may or may not comply with the threshold. In some exemplary embodiments, a semantic classifier may be applied to identify a topic of an interaction, or a mood related to the interaction. In some exemplary embodiments, in case a personalized model matching the digital media is not detected, or if a matching personalized model has an inaccurate or incorrect context that does not comply with the context threshold, the digital media may be disregarded without being authenticated. Otherwise, the method may continue to Step 240.
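The context-difference test can be sketched as a weighted count of mismatching context attributes compared against a threshold. This is one possible scoring scheme under stated assumptions, not the scheme of the disclosure: the attribute names follow the examples above, while the weights in `CONTEXT_WEIGHTS` and the default threshold of 0.3 are hypothetical values chosen only for illustration.

```python
from typing import Dict, Mapping

# Hypothetical weights: how much a mismatch in each context attribute
# contributes to the difference score. Attribute names follow the text;
# the numeric values are illustrative assumptions.
CONTEXT_WEIGHTS: Dict[str, float] = {
    "medium": 0.2,         # e.g., phone call vs. video call
    "participant": 0.4,    # identity of the participant
    "identity_type": 0.2,  # e.g., child, co-worker, customer service
    "topic": 0.2,          # e.g., business vs. chores
}


def context_difference(model_ctx: Mapping[str, str], media_ctx: Mapping[str, str]) -> float:
    """Sum the weights of attributes whose values differ; 0 means identical."""
    return sum(
        weight
        for attr, weight in CONTEXT_WEIGHTS.items()
        if model_ctx.get(attr) != media_ctx.get(attr)
    )


def model_applicable(
    model_ctx: Mapping[str, str],
    media_ctx: Mapping[str, str],
    threshold: float = 0.3,
) -> bool:
    """True if the difference complies with the context threshold."""
    return context_difference(model_ctx, media_ctx) <= threshold
```

With these example weights, a business-versus-chores topic mismatch alone scores 0.2 and passes, while a topic mismatch combined with a different medium scores 0.4 and fails, which mirrors the statement that such a difference may or may not comply with the threshold.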

In some exemplary embodiments, a private personalized model of the participant may be trained based on a dataset including a plurality of records of communications under the certain communication context. In some exemplary embodiments, the dataset may retain previous communications between the user and the participant, as well as digital media associated with the participant under a certain context. For example, a personalized model of communication between co-workers may be trained over a dataset including a plurality of records of communications between the co-workers. In some exemplary embodiments, the plurality of records of digital media may be accumulated by recording any such communication, e.g., via a camera, a Virtual Reality (VR) headset, Augmented Reality (AR) glasses, a microphone, or the like. For example, the dataset for the personalized model of communications between the co-workers may include audio records of real-life face-to-face conversations between the co-workers, telephone conversations between them, video conversations between them, or the like. In some exemplary embodiments, the plurality of records may include a stream of digital media captured by a user's Internet of Things (IoT) camera, his device's camera, an independent camera, or the like. For example, the user's camera may capture a SKYPE™ communication that is displayed on the user's computer screen. In some exemplary embodiments, the dataset may include accessible private or public media, retained previous communications with the participant, privately or publicly accessible databases of media, or the like, which may be captured under a certain communication context. In some cases, upon reaching a sufficient dataset size threshold, the dataset may be considered large enough for training a personalized model.

In some exemplary embodiments, the personalized model may be implemented by a machine learning model, or any other type of classifier, which may be configured to identify behavioral patterns of a participant in a communication context.

For example, in case the user is Alice and the participant is Charlie, a personalized model of Charlie may be produced by Alice based on recorded communications between Alice and Charlie. Additionally, or alternatively, the personalized model of Charlie may be trained by Alice based on recorded communications between Alice and other people in Charlie's category, e.g., people whose relationship to Alice is similar to Charlie's. For example, in case Charlie is a brother of Alice, a personalized model of all of Alice's brothers, including Charlie, may be trained based on all of their mutual communications with Alice.

In some exemplary embodiments, at least some authenticated communication sessions may be used for training the personalized model. In some exemplary embodiments, recorded communications may be considered authenticated if they were recorded up to a certain period of time ago, such as up to a month ago, up to 3 weeks ago, or the like, and only if such recorded communications were not reported by the user or the participant as manipulated. In other cases, any recorded communications may be considered authenticated if they were not reported by the user as fraud. In some exemplary embodiments, recordings of a user communicating with a third party that are indicated by the user as fake may be considered authentic, even if the third party denies such labeling. In some exemplary embodiments, audio or video recordings of actual real-life meetings may be automatically considered authentic. In some exemplary embodiments, some communications or other digital media may be authenticated using automated techniques, such as based on a cryptographic signature, using secured communication devices, or the like.
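A filter that selects recordings eligible for training might combine the recency rule, the "not reported as manipulated" rule, and the automatic acceptance of real-life meetings. This Python sketch assumes that "recorded up to a month ago" means "recorded within the last 30 days"; the `Recording` fields and the `max_age` default are hypothetical, and the sketch omits the cryptographic-signature path.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Recording:
    recorded_at: datetime
    reported_as_manipulated: bool = False  # reported by the user or participant
    real_life_meeting: bool = False        # e.g., captured in person via AR glasses


def authenticated_for_training(
    recordings: Iterable[Recording],
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed reading of "up to a month ago"
) -> List[Recording]:
    """Return the recordings that may be treated as authentic training data."""
    selected: List[Recording] = []
    for rec in recordings:
        if rec.real_life_meeting:
            # Recordings of actual real-life meetings are accepted automatically.
            selected.append(rec)
        elif now - rec.recorded_at <= max_age and not rec.reported_as_manipulated:
            # Recent enough and never reported as manipulated.
            selected.append(rec)
    return selected
```

The other variant mentioned in the text, accepting any recording the user has not reported as fraud, corresponds to dropping the recency condition.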

In some exemplary embodiments, the personalized model may be trained on a dataset including a first class including authentic media of the participant in scenarios associated with a communication context, or a modified version thereof. In some exemplary embodiments, the personalized model may be trained on a second class including media of different people, e.g., different from the participant, in the same context, or in a different context, which may be processed or be in an original form. In some exemplary embodiments, the personalized model may be trained to classify one or more media streams to determine whether they belong to the first or second class. In case the personalized model indicates that a media stream belongs to the first class, it may be determined to be authentic media of the participant. In case the personalized model indicates that a media stream belongs to the second class, it may be determined to be manipulated or deepfake media.

In some exemplary embodiments, the first and second classes may be obtained, generated, or the like, and utilized as a tagged dataset of the personalized model. In some exemplary embodiments, in order to create the first class for the dataset, deepfake techniques or other media fabrication techniques may be implemented on at least some of the authentic media of the participant to impersonate a plurality of different people, which may preserve the behavioral patterns of the participant under the context. In some exemplary embodiments, for generating the second class, deepfake techniques or other media fabrication techniques may be implemented on at least some of the media of the different people excluding the participant to impersonate any other people thereon, thereby preserving the behavioral patterns of the other people.

In some exemplary embodiments, the first class may include the modified authentic media of the participant, as well as unmodified authentic videos of the participant. In some exemplary embodiments, the first class may comprise fabricated media that is generated based on the authentic media of the participant, e.g., by superimposing different people on the participant, possibly including himself. Additionally, or alternatively, the first class may comprise fabricated media that is generated by two consecutive manipulations, such as replacing the participant by another person and then re-replacing the other person with the original participant. In such a case, the media may be manipulated media that exhibits manipulations associated with the utilization of deepfake technology but still exhibits the original conduct and behavior of the target person. Additionally, or alternatively, the first class of media may comprise fabricated media that is generated using a plurality of different alternative deepfake engines, such as may be available for public use. As a result, the first class may comprise at least some deepfake media that is generated using different techniques.

In some exemplary embodiments, the second class in the dataset may be obtained by implementing media fabrication techniques on media originally depicting people excluding the participant. In some cases, the media may capture a plurality of people excluding the participant, e.g., under the context, under any other second context, or the like. In some exemplary embodiments, media fabrication techniques may be applied to at least some of the media files to impersonate the participant thereover, to impersonate a group of other people thereover, or the like.

In some exemplary embodiments, each video or audio in the dataset may be labeled as belonging to the first or second class. In some exemplary embodiments, each media file of the first class that originally depicted the participant may be tagged or labeled as true, while each media file of the second class that originally did not depict the participant may be tagged or labeled as false. In some cases, each modified or unmodified authentic media of the participant may be labeled as true, e.g., since it may depict true behavioral patterns of the participant under the context, even if not accurately depicting his face, body, voice, or the like. In some exemplary embodiments, media of the second class may be labeled as false, e.g., since it may not depict true behavioral patterns of the participant.
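The labeling rule above depends only on whom the media originally depicted, not on whether it was manipulated. The following Python sketch makes that explicit. `fabricate` is a hypothetical stand-in for any deepfake or media fabrication engine, and media items are represented as plain strings purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    media: str         # placeholder for a media file or stream handle
    label: bool        # True: originally depicts the participant's behavior
    manipulated: bool  # whether a fabrication step was applied


def build_labeled_dataset(
    participant_media: Sequence[str],
    other_media: Sequence[str],
    fabricate: Callable[[str], str],  # hypothetical fabrication engine
) -> List[Sample]:
    samples: List[Sample] = []
    for m in participant_media:
        # First class: unmodified and fabricated versions of authentic
        # participant media are both labeled True.
        samples.append(Sample(m, True, False))
        samples.append(Sample(fabricate(m), True, True))
    for m in other_media:
        # Second class: media originally depicting other people is labeled
        # False, even after manipulation (e.g., showing the participant's face).
        samples.append(Sample(m, False, False))
        samples.append(Sample(fabricate(m), False, True))
    return samples
```

Because manipulated samples appear under both labels, fabrication artifacts alone do not predict the label, which is the property the training phase relies on to push the classifier toward behavioral cues.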

In some exemplary embodiments, during a training phase, the machine learning model may learn to separate the two classes by distinguishing what makes the participant the “real” participant, for example, based on identifying behavioral properties of the participant under the context which may not be successfully incorporated in fabricated versions of the participant. This way, the classifier may be forced to identify the behavioral patterns of the participant himself, without being able to rely on other information such as facial features or deepfake signatures. In some exemplary embodiments, if the machine learning model tries to distinguish between the first and second classes by identifying specific deepfake effects that were used, such as inconsistent boundaries or problems in specific patches, this may not work, since both classes may depict similar deepfake techniques. In some exemplary embodiments, if the machine learning model tries to distinguish between the first and second classes based on facial recognition methods, its effort may not succeed, since identical faces (e.g., the face of the participant, the faces of the predetermined people, or the like) may be featured in both classes.

In some exemplary embodiments, behavioral properties of a participant that may be utilized by the machine learning model to distinguish the participant from deepfake impersonations may include language properties such as words that are frequently used, a combination of words that is frequently used, frequent intonation patterns, grammar that is typically used, a certain structure of language, or the like. In some exemplary embodiments, behavioral properties of the participant that may be identified may include body behaviors such as certain facial movements, a certain gait, certain eye movements, a symmetry level of body movements, a relation between facial movements and content of a speech, a relation between facial movements and a corresponding intonation of speech, a relation between facial movement and hand movement, a relation between content of speech or intonation thereof and hand movement, a combination thereof, or the like. In some exemplary embodiments, the machine learning model may learn any additional or alternative behavioral property that may be useful for distinguishing the first and second classes.
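To make one of these language properties concrete, the sketch below computes a relative word-frequency profile from a transcript and compares two profiles with cosine similarity. This is a hand-crafted feature shown only for illustration; the text describes a machine learning model that learns such properties, and the function names and tokenization rule here are assumptions. Obtaining the transcript (e.g., via speech recognition) is out of scope.

```python
import math
import re
from collections import Counter
from typing import Dict


def word_profile(transcript: str) -> Dict[str, float]:
    """Relative frequency of each word in a transcript (simple tokenizer)."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(words)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {w: c / total for w, c in counts.items()}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse profiles; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A profile built from the participant's authentic conversations could be compared against the profile of a suspicious call; a low similarity would be one weak signal among the many behavioral properties listed above.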

In some exemplary embodiments, more than one personalized model may be trained for the same person, e.g., based on different contexts, relationships, or the like. For example, Alice may be a friend of Bob and a co-worker of Charlie. In such a case, Charlie may retain a personalized model of his co-worker Alice to determine whether or not phone calls or video chats that are allegedly initiated by Alice have been forged. At the same time, Bob may retain a different personalized model of his friend Alice, who is the same person, to determine whether or not phone calls or video calls that are allegedly initiated by Alice have been forged. In some cases, a context of interactions between Charlie and Alice may be characterized by a co-working relationship, while a context of interactions between Bob and Alice may be characterized by a casual friendship relationship. According to this example, the model of Alice used by Charlie may be trained on a dataset including recorded video and audio communications between Charlie and Alice, while the personalized model of Alice used by Bob may be trained on a dataset including recorded video and audio communications between Bob and Alice. Accordingly, behavioral patterns of the personalized model of Alice retained by Charlie may be at least slightly different from behavioral patterns of the personalized model of Alice retained by Bob, although they both depict unique behavioral patterns of Alice. For example, the personalized model retained by Charlie may be characterized by Alice's formal body language, while the personalized model retained by Bob may be characterized by Alice's informal behavior.

On Step 240, the personalized model may be executed on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the real-time media stream matches the behavioral pattern of the participant according to the personalized model. In some exemplary embodiments, the media stream may be inspected in real time, in order to provide a real-time classification of the media stream.
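Real-time inspection can be sketched as scoring segments as they arrive and averaging over a sliding window, so that one noisy segment does not trigger an alarm. Everything here is an assumption for illustration: `model` is any callable returning the probability that a segment matches the participant's behavioral pattern, `on_mismatch` stands in for the responsive action, and the `window` and `threshold` defaults are arbitrary.

```python
from collections import deque
from statistics import mean
from typing import Callable, Iterable


def authenticate_stream(
    segments: Iterable[object],
    model: Callable[[object], float],
    on_mismatch: Callable[[float], None],
    window: int = 5,
    threshold: float = 0.5,
) -> bool:
    """Score segments as they arrive; trigger a responsive action on mismatch.

    model: hypothetical personalized model returning a match probability.
    on_mismatch: hypothetical responsive action, e.g., alerting the user.
    Returns True if no mismatch was detected over the whole stream.
    """
    recent = deque(maxlen=window)  # sliding window of the latest scores
    for segment in segments:
        recent.append(model(segment))
        if len(recent) == window:
            avg = mean(recent)
            if avg < threshold:
                on_mismatch(avg)
                return False
    return True
```

Stopping at the first mismatch suits a live call, where the responsive action (such as warning the user) should happen as early as possible; an offline variant could instead collect every low-scoring window.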

In some exemplary embodiments, the stream may be monitored to identify portions depicting the participant under the communication context. In some cases, the portions may be identified using facial recognition methods, audio recognition methods, tags relating to the participant, references relating to the participant, or any other digital media recognition technique. In some exemplary embodiments, the portions of a continuous stream of media that depict a certain communication or context may be cropped. In some exemplary embodiments, patches depicting the participant's features may be extracted or cropped from the digital media. In some exemplary embodiments, communication sessions may include remaining portions that are irrelevant for a model, such as remaining portions that do not depict a communication between the user and the participant, remaining portions that do not depict the participant's features, or the like. For example, the digital media may include a video, and the person's features may be associated with a human object in one or more frames of the video. The associated human object, or frames depicting the associated human object, may be extracted from the digital media, e.g., to provide an extracted portion. In another example, the digital media may include an audio session, and the person's features may include his identified voice in the audio. In some cases, one or more sessions with the identified voice may be extracted from the audio stream and kept separately, e.g., to provide an extracted portion. In some cases, media may be cropped or processed using a facial recognition technique, an audio recognition technique, a manual crop or removal, or the like. In some cases, a classifier may be configured to cluster communications according to their contexts, type of communication, or the like.

As an example, Alice may wish to train a personalized model for identifying authentic communications with Bob using video conference calls in which both of them participated. In such a case, Alice may utilize a face recognition technique, or manually crop the recorded video conference, to remove portions in which Alice and Bob are not communicating with each other, portions in which Bob is not depicted, or portions in which Alice and Bob are not communicating in a specific interaction context. In some exemplary embodiments, the cropped portions may be used to train the personalized model of Bob, may be added to the dataset, or the like.

In some exemplary embodiments, the participant may be associated with a human object in one or more frames of a video stream. In some exemplary embodiments, the associated human object, or frames depicting the associated human object, may be extracted from the detected digital media. In some exemplary embodiments, portions of the digital media that feature a voice of the participant may be extracted from the audio. As another example, in a video conference call of four participants, the screen may be split into four corresponding regions, each of which is associated with a different participant. In some exemplary embodiments, a personalized model of a participant may be executed or otherwise applied on the portion of the frame that is associated with the participant, such as the top-right quarter of the frame. Additionally, or alternatively, the display of the video conference call may intermittently switch between a split-screen mode and a main-screen mode, such as showing a single participant (e.g., a participant who is considered active, who is speaking, or the like). The personalized model of the participant may be executed on frame segments in the split-screen mode that are associated with the participant, as well as on main-screen mode frames in which the participant is the active participant being displayed. Additionally, or alternatively, the audio of the video conference may be segmented based on the identity of the current speaker, and the audio of the participant may be cropped and provided to the personalized model for further classification as being manipulated or authentic.
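The region selection for the split-screen and main-screen modes can be sketched as follows. This assumes an evenly divided grid (2x2 for four participants), a row-major slot index per participant, and the hypothetical mode labels `"split"` and `"main"`; real conferencing layouts may differ and would have to be detected from the stream.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def participant_region(width: int, height: int, mode: str, slot: int,
                       is_active: bool, rows: int = 2, cols: int = 2) -> Optional[Box]:
    """Return the pixel box on which to run a participant's personalized model.

    mode: "split" for an evenly divided grid, "main" for a single-speaker view
          (both labels are assumptions for this sketch).
    slot: the participant's row-major index in the grid (0 = top-left).
    Returns None when the frame should not be fed to this participant's model.
    """
    if mode == "main":
        # Use main-screen frames only while this participant is the active one.
        return (0, 0, width, height) if is_active else None
    if mode != "split":
        raise ValueError(f"unknown layout mode: {mode!r}")
    if not 0 <= slot < rows * cols:
        raise ValueError("slot lies outside the grid")
    row, col = divmod(slot, cols)
    cell_w, cell_h = width // cols, height // rows
    return (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
```

In a 1280x720 frame, slot 1 of a 2x2 grid is the top-right quarter, `(640, 0, 1280, 360)`.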

In some exemplary embodiments, two or more different personalized models of respective people may be utilized during the same communication session. For example, a communication session may include a video conference with four participants. A first participant may retain two personalized models matching communications with two of the other participants of the video conference. Accordingly, both personalized models may be applied to extracted video streams depicting the two participants, respectively, to determine, e.g., at the first participant's device, whether or not they are forged. In another example, an audio recording may include a conversation between five people, a first participant thereof having three personalized models that match three of the remaining participants. The matching personalized models may be applied to audio patches depicting the corresponding participants, to determine whether each participant's input (e.g., his voice) is forged. In some exemplary embodiments, in case some of the personalized models do not match the communication context, those personalized models may not be utilized.
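A minimal sketch of applying several personalized models within one session is shown below. It assumes, for illustration only, that models are stored in a dictionary keyed by (participant, context) and that each model is a callable returning `True` when a portion is classified as authentic; participants without a model for the current context are skipped.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation: a personalized model is a callable that
# returns True when a media portion is classified as authentic.
Model = Callable[[bytes], bool]


def classify_participants(models: Dict[Tuple[str, str], Model],
                          portions: Dict[str, bytes],
                          context: str) -> Dict[str, bool]:
    """Apply, per participant, the model trained for the current context.

    `portions` maps each participant to the media extracted for them.
    Participants with no model matching `context` are left unclassified.
    """
    results: Dict[str, bool] = {}
    for participant, portion in portions.items():
        model = models.get((participant, context))
        if model is not None:
            results[participant] = model(portion)
    return results
```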

In some exemplary embodiments, an authenticity of the media stream may be determined, e.g., by executing the personalized model of the participant on cropped portions of the media stream, on the entire media stream, or the like. In some exemplary embodiments, the personalized model of each participant may classify the media stream as belonging to the first class or to the second class, thereby determining whether the identified patterns of the participant match his typical behavioral patterns. In some exemplary embodiments, a classification of a portion as belonging to the first class may indicate that the portion is authentic, while a classification of the portion as belonging to the second class may indicate that the portion is not authentic and has been tampered with. In case one or more portions of the media stream are classified as authentic, and one or more other portions of the media stream are classified as fraudulent, the entire media stream may be considered to be manipulated, tagged as such, or the like.
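The aggregation rule above, in which any portion classified as fraudulent taints the whole stream, can be sketched as follows. The `Label` enum is a hypothetical stand-in for the first and second classes, and returning `None` when no portion was classified is a choice made for this sketch.

```python
from enum import Enum
from typing import Iterable, Optional


class Label(Enum):
    AUTHENTIC = 1  # first class: matches the participant's behavioral pattern
    FORGED = 2     # second class: does not match; possibly tampered with


def stream_verdict(portion_labels: Iterable[Label]) -> Optional[Label]:
    """Aggregate per-portion labels into a verdict for the whole stream.

    A single forged portion marks the entire stream as manipulated, even if
    other portions were classified as authentic. Returns None if no portion
    was classified at all.
    """
    labels = list(portion_labels)
    if not labels:
        return None
    return Label.FORGED if Label.FORGED in labels else Label.AUTHENTIC
```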

In some exemplary embodiments, a fabricated media stream of the participant in the communication context may violate the personalized model of the participant in the communication context, while an authentic media stream of the participant in the communication context may comply with the personalized model. In some exemplary embodiments, malicious or fabricating entities may be forced to accurately imitate a behavioral pattern of the participant under the communication context in order to pass the personalized model's classification. This may be a very challenging requirement, as it requires a large quantity of personal communications of the participant in that context, and may therefore greatly reduce the possibility of producing fraudulent media of the participant that complies with the participant's personalized model.

For example, the personalized model may determine whether or not the participant depicted in the digital media has movements and/or behavioral patterns that correspond to the participant's typical behavioral patterns during a similar session or context, e.g., as identified by the personalized model. In some cases, in case the movements or behavioral patterns that are identified in the digital media are classified by the personalized model as not matching the participant, the digital media may be determined to be forged. For example, it may be determined that the participant's features, such as face, body, or voice, have been superimposed on a media stream using one or more media fabrication or deepfake technologies. In some cases, in case the movements or behavioral patterns that are identified in the digital media are classified by the personalized model as matching the participant, the digital media may be determined to be authentic.

On Step 250, upon identifying a mismatch between the behavioral pattern of the participant in the real-time media stream and the behavioral pattern of the participant according to the personalized model, a responsive action may be performed. In some exemplary embodiments, the mismatch may be identified by the personalized model. In some exemplary embodiments, the responsive action may include an alert indicating that the communication is forged at least in part, a notification that the communication is forged at least in part, blocking of an ongoing communication, or the like.
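The responsive action of Step 250 might be wired up roughly as below. The `Session` record and the `block` flag are hypothetical; the sketch only records an alert on a mismatch and, when blocking is requested, marks the ongoing communication as terminated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    participant: str
    active: bool = True                      # False once the call is blocked
    alerts: List[str] = field(default_factory=list)


def on_classification(session: Session, matches: bool, block: bool = False) -> None:
    """Perform a responsive action when the behavioral pattern does not match."""
    if matches:
        return  # no mismatch, so no action is taken
    session.alerts.append(
        f"communication with {session.participant} may be forged at least in part")
    if block:
        session.active = False  # block the ongoing communication
```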

In some exemplary embodiments, at the end of the communication, a verification indication may be obtained, e.g., from the user, indicating whether or not the personalized model's classification was accurate. In some exemplary embodiments, the verification indication may be obtained in an authenticated manner, such as in a real-life meeting with the participant, or using a secured medium such as a cryptographic signature, a personal password, secured communication devices, or the like. In some exemplary embodiments, the verification indication may be used to tag the communication as correctly classified or incorrectly classified, which may be used for enhancing the personalized model by further training.
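One way to turn verification indications into retraining data is sketched below, assuming a hypothetical `Record` that stores the model's prediction and, once available, the verified label. Records without a trusted verification are excluded rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    media_id: str
    predicted_authentic: bool
    verified_authentic: Optional[bool] = None  # set via an authenticated channel


def feedback_examples(records: List[Record]) -> List[Tuple[str, bool, bool]]:
    """Return (media_id, true_label, correctly_classified) for verified records."""
    out: List[Tuple[str, bool, bool]] = []
    for r in records:
        if r.verified_authentic is None:
            continue  # no trusted verification yet; do not use for retraining
        out.append((r.media_id, r.verified_authentic,
                    r.predicted_authentic == r.verified_authentic))
    return out
```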

Referring now to FIG. 3 illustrating a flowchart diagram of a method, in accordance with some exemplary embodiments of the disclosed subject matter.

On Step 310, the forging capabilities of malicious entities, or of any other entity, may be reduced using one or more of Steps 320-340. In some exemplary embodiments, a forging entity may train a classifier, such as by utilizing a GAN framework, to utilize classification results of the personalized model for enhancing its ability to evade detection by the personalized model. For example, the GAN framework may create a plurality of different deepfake videos or audios and score them according to classification results from the personalized model, which may be obtained, e.g., by directly accessing the personalized model or by analyzing which media is detected as fraudulent based on detected responsive actions. For example, in case fabricated media is removed from a social network, the GAN framework may monitor its fabricated media to see which items are removed and which remain in the network, and use the results to enhance its capabilities.

On Step 320, access to a personalized model of a participant may be at least partially limited. In some exemplary embodiments, for the GAN framework to be able to learn how to overcome the personalized model, it may require direct access to the personalized model in order to obtain classification results therefrom. In some exemplary embodiments, in order to avoid such usage of the personalized model, the personalized model may be located at a secure location such as a remote server, a confidential server, a secure node or device, or any other location that does not provide public access and is not easily broken into. In some exemplary embodiments, the personalized model may be kept at a non-secure location that is secured using one or more security mechanisms such as passwords, keys, or the like. In some exemplary embodiments, keeping the personalized model at a secure location or in a secure manner may reduce a risk of it being exploited by one or more GAN frameworks or any other media fabrication technique.

In some exemplary embodiments, in addition to the personalized model itself, any data used for its training, e.g., an initial dataset and any additions thereto, such as real-time communications, may be kept securely in the same secure location as the personalized model, or in a different secure location that is accessible to the personalized model. In some exemplary embodiments, the data may be retained using security mechanisms that correspond to those used for the personalized model, or any other security mechanisms.

On Step 330, the personalized model may limit its scope to communication mediums alone, e.g., to further reduce a risk of being exploited by one or more GAN frameworks. In some exemplary embodiments, the personalized model may be configured to continuously or intermittently monitor, e.g., at its secure location, digital media of communication mediums such as FACEBOOK™ video calls, or any other medium. In some exemplary embodiments, the communication mediums may be monitored for media that depicts one or more people, e.g., as determined by the user. For example, a user may train a personalized model of himself to identify his behavioral patterns, and monitor communication mediums depicting himself, in order to apply the model thereon. From the point of view of a forging entity, it may not be feasible to train a network on classification results from real-time communication mediums alone, e.g., since such efforts may be easily spotted and blocked. In other cases, the personalized model may not limit its scope to certain mediums.

On Step 340, the personalized model may limit its scope to social media platforms, e.g., to further reduce a risk of being exploited. In some exemplary embodiments, digital media from social network platforms may be continuously or intermittently monitored, and media for classification may be selected based on a popularity criterion, a quality-of-popularity criterion, or the like. For example, the digital media may be considered popular or trendy in case it is shared a number of times that exceeds a popularity threshold, by different users that are considered to be authentic users, or the like. In other cases, the personalized model may not limit its scope to certain platforms.

In some exemplary embodiments, due to the popularity criteria, only viral media may be evaluated and classified by the personalized model. In some exemplary embodiments, media may be evaluated by the personalized model only when exceeding the popularity criteria. For example, only media that has been shared over a certain number of times, e.g., 10 or any other number, may be evaluated. In another example, the popularity criteria may utilize one or more metrics provided by social media platforms such as impressions, Click-Through Rate (CTR), or the like. In another example, the popularity criteria may be based on any other information relating to a popularity trend of the media. In some exemplary embodiments, media may not be considered viral if shared and viewed by fraudulent profiles. For example, in case the popularity criteria require sharing of content by at least ten users, and ten users were found to upload and share with each other a thousand media files, the popularity criteria may not be considered to be fulfilled. Specifically, these profiles may be considered to be fraudulent and may not be used for calculating the popularity criteria when determining a number of shares of the content. In some exemplary embodiments, only authentic profiles may be used for calculating the popularity criteria. This may further reduce the possibility of implementing the coevolution or GAN techniques.
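The share-count variant of the popularity criterion can be sketched as follows. It assumes share events are available as (profile_id, media_id) pairs and that a set of profiles already judged authentic is known (how that set is built is outside this sketch). Each authentic profile counts once, and profiles outside the set, such as a ring of fraudulent profiles sharing with each other, are ignored.

```python
from typing import Iterable, Set, Tuple


def meets_popularity(shares: Iterable[Tuple[str, str]], media_id: str,
                     authentic_profiles: Set[str], threshold: int = 10) -> bool:
    """Return True if at least `threshold` distinct authentic profiles shared `media_id`.

    shares: (profile_id, media_id) events; repeated shares by one profile
    count once, and profiles not in `authentic_profiles` are disregarded.
    """
    sharers = {p for p, m in shares if m == media_id and p in authentic_profiles}
    return len(sharers) >= threshold
```

Only media passing this check would then be submitted to the personalized model, which limits how many indirect classifications a forging entity can collect.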

In some exemplary embodiments, user profiles that perform suspicious actions, such as profiles that posted or uploaded one or more fabricated videos or other media, may be detected and reported. In some exemplary embodiments, one or more types of analyses may be performed in order to detect a source of shared deepfake media and to detect the first profile that uploaded each deepfake media item. In some cases, suspicious profiles that are detected using the analyses may be reported as such to corresponding social media sites. In some cases, social media sites may deploy in-house techniques to identify fraudulent profiles, which may be further utilized for reducing fraud attempts. These in-house techniques may be at least partially relied upon for further limiting the possibility of applying coevolution or GAN techniques.

In order to implement the coevolution or GAN techniques when the scope is limited to social media platforms with the popularity criteria, the forging entity may be required to create a plurality of fabricated media files and post them on social media sites, e.g., in order to obtain indirect classifications of the personalized model. In some exemplary embodiments, the forging entity may then be required to obtain a large number of high-quality responses and shares from authenticated profiles for each uploaded media file in order to obtain indirect classifications of the personalized model.

Referring now to FIG. 4 showing a block diagram of an apparatus, in accordance with some exemplary embodiments of the disclosed subject matter.

In some exemplary embodiments, an Apparatus 400 may comprise a Processor 402. Processor 402 may be a Central Processing Unit (CPU), a microprocessor, an electronic circuit, an Integrated Circuit (IC), or the like. Processor 402 may be utilized to perform computations required by Apparatus 400 or any of its subcomponents. Processor 402 may be configured to execute computer programs useful in performing the methods of FIGS. 2-3, or the like.

In some exemplary embodiments of the disclosed subject matter, an Input/Output (I/O) Module 405 may be utilized to provide an output to and receive input from a user. I/O Module 405 may be used to transmit and receive information to and from the user or any other apparatus, e.g., a plurality of user devices, in communication therewith.

In some exemplary embodiments, Apparatus 400 may comprise a Memory Unit 407. Memory Unit 407 may be a short-term storage device or long-term storage device. Memory Unit 407 may be a persistent storage or volatile storage. Memory Unit 407 may be a disk drive, a Flash disk, a Random Access Memory (RAM), a memory chip, or the like. In some exemplary embodiments, Memory Unit 407 may retain program code operative to cause Processor 402 to perform acts associated with any of the subcomponents of Apparatus 400. In some exemplary embodiments, Memory Unit 407 may retain program code operative to cause Processor 402 to perform acts associated with any of the steps in FIGS. 2-3, or the like.

In some exemplary embodiments, Memory Unit 407 may comprise one or more Personalized Models 409. In some exemplary embodiments, each model of the Personalized Models 409 may correspond to a communication of a user with a certain participant, a category of participant, or the like, in a communication context. In some exemplary embodiments, Personalized Models 409 may be configured to authenticate media streams by comparing identified patterns of a depicted participant with typical patterns of the depicted participant as identified by the associated model. In some exemplary embodiments, Personalized Models 409 may be located elsewhere, such as at a server, a cloud network, or the like, in a location that is accessible by Apparatus 400. In some exemplary embodiments, each model of the Personalized Models 409 may comprise a supervised machine learning model that is trained on a corresponding dataset of Datasets 411.

In some exemplary embodiments, Memory Unit 407 may comprise Datasets 411, which may comprise one or more datasets that may be utilized for training Personalized Models 409. Datasets 411 may comprise media depicting previous communications of participants with the user. In some exemplary embodiments, each video or audio in Datasets 411 may be labeled as belonging to a first or second class, the first class originally depicting the corresponding participant, and the second class originally not depicting the corresponding participant. In some exemplary embodiments, within a class, different fabrication and deepfake techniques may be implemented. In some exemplary embodiments, media belonging to the first class may be tagged as true, while media belonging to the second class may be tagged as false, e.g., for training purposes of the supervised machine learning model.
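The labeling scheme for Datasets 411 can be illustrated with the sketch below. The `MediaRecord` type and the `fabrication` field naming an applied deepfake technique are hypothetical. The tag follows what the record originally depicted, so a fabricated variant keeps the class of the record it was derived from.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MediaRecord:
    path: str
    originally_depicts_participant: bool  # True: first class, False: second class
    fabrication: Optional[str] = None     # hypothetical name of an applied technique, if any


def training_pairs(records: List[MediaRecord]) -> List[Tuple[str, bool]]:
    """Map each record to (path, tag): True for the first class, False for the second."""
    return [(r.path, r.originally_depicts_participant) for r in records]
```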

The components detailed below may be implemented as one or more sets of interrelated computer instructions, executed, for example, by Processor 402 or by another processor. The components may be arranged as one or more executable files, dynamic libraries, static libraries, methods, functions, services, or the like, programmed in any programming language and under any computing environment.

In some exemplary embodiments, Media Obtainer 410 may be configured to obtain a digital media stream from a media source, e.g., a server, a displayed communication such as on a screen, or the like. Media Obtainer 410 may be configured to capture the media stream via I/O Module 405, via a camera capturing a screen, via one or more communication applications of a user device associated with Apparatus 400, or via any other component or device.

In some exemplary embodiments, Context Identifier 420 may be configured to obtain a captured media stream from Media Obtainer 410, and process the media stream to identify a context thereof. In some exemplary embodiments, Context Identifier 420 may be configured to extract or crop from the media stream portions that depict one or more participants of interest, or types thereof, for which one or more models of Personalized Models 409 are trained. In some exemplary embodiments, Context Identifier 420 may identify a communication context of the portions, such as the participants of interest themselves, the category thereof, a topic of the interaction, or the like.

In some exemplary embodiments, Model Executer 430 may be configured to obtain media stream portions and contexts thereof from Context Identifier 420, and identify whether Personalized Models 409 comprise models associated with the identified contexts. In some exemplary embodiments, Model Executer 430 may be configured to apply a corresponding model of Personalized Models 409 on each media portion for which a corresponding model is found.

In some exemplary embodiments, Matching Identifier 440 may be configured to obtain classification results from Model Executer 430, and identify based thereon whether the media stream matches the associated model of the Personalized Models 409. In case a match is found, Matching Identifier 440 may determine that the media stream is authentic. In case a mismatch is found, Matching Identifier 440 may determine that the media stream is fabricated.
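One possible decision rule for Matching Identifier 440, in line with the threshold-based mismatch recited in claim 3, is sketched below. It assumes, purely for illustration, that a behavioral pattern is summarized as a numeric feature vector and that the participant's typical pattern is a reference vector of the same length; the Euclidean distance and the threshold value are assumptions, not the disclosed model.

```python
import math
from typing import Sequence


def is_mismatch(observed: Sequence[float], profile: Sequence[float],
                threshold: float) -> bool:
    """Return True when the observed behavioral features deviate from the
    participant's reference profile by more than `threshold`.

    Deviation is measured here as Euclidean distance (math.dist, Python 3.8+).
    """
    if len(observed) != len(profile):
        raise ValueError("feature vectors must have the same length")
    return math.dist(observed, profile) > threshold
```

For instance, observed features `[0.0, 0.0]` against a profile `[3.0, 4.0]` lie at distance 5.0, a mismatch for any threshold below 5.0.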

In some exemplary embodiments, Apparatus 400 may be implemented within a communication system of a user, such as a phone communication system, a video communication system, or the like, which may retain, for each participant of interest, models to be applied during communications with that participant.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

What is claimed is:
1. A method comprising: obtaining a media stream associated with a participant, wherein the media stream depicts a real-time communication of the participant in a communication context; identifying the communication context; obtaining a personalized model of the participant when communicating in the communication context, wherein the personalized model is configured to identify a behavioral pattern of the participant; executing the personalized model on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the media stream matches the behavioral pattern of the participant according to the personalized model; and upon identifying a mismatch between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model, performing a responsive action.
2. The method of claim 1, wherein the responsive action comprises generating an alert or blocking the real-time communication, wherein the alert indicates that the media stream is forged.
3. The method of claim 1, wherein said identifying the mismatch comprises determining that a difference between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model exceeds a threshold.
4. The method of claim 1, wherein the personalized model comprises a classifier that is trained on a dataset, wherein the dataset comprises media records depicting communications of the participant in the communication context.
5. The method of claim 4, wherein the dataset comprises a first class of media and a second class of media, wherein the first class comprises media records originally depicting the participant in a communication context, wherein the second class comprises media records originally depicting other people excluding the participant in the communication context, the method comprising training the personalized model to classify media as belonging to the first class or to the second class.
6. The method of claim 5 comprising: implementing media fabrication techniques on the first class, thereby obtaining processed records of the participant, wherein said media fabrication techniques are configured to replace the participant with different people excluding the participant; implementing media fabrication techniques on the second class, thereby obtaining processed records of the other people, wherein said media fabrication techniques are configured to replace the other people; adding the processed records of the participant to the first class; and adding the processed records of the other people to the second class.
7. The method of claim 6, wherein said implementing the media fabrication techniques on the second class comprises superimposing the participant over at least some of the other people.
8. The method of claim 1, comprising training a first personalized model of the participant under a first communication context, and training a second personalized model of the participant under a second communication context.
9. The method of claim 1, wherein the communication context is selected from a group consisting of: a friendship relationship, a co-working relationship, a family relationship, a business relationship, a customer-client relationship, and a romantic relationship.
10. The method of claim 1, wherein the communication context comprises a topic of the real-time communication.
11. The method of claim 1, comprising determining an identity of the participant based on at least one of: a facial recognition method implemented on the media stream, an audio recognition method implemented on the media stream, metadata of the media stream, and tags relating to the participant that are attached to the media stream, wherein the communication context comprises the identity of the participant.
12. The method of claim 1, wherein said identifying the communication context comprises determining a second participant in the real-time communication, wherein the media stream depicts the real-time communication between the participant and the second participant; wherein the communication context is a context of the participant communicating with the second participant; and said obtaining the personalized model comprises: obtaining a private model generated based on past communications between the participant and the second participant, wherein the past communications are not publicly accessible.
13. The method of claim 1, wherein the behavioral pattern of the participant comprises at least one of: face movements of the participant, face gestures of the participant, a gait of the participant, a walking pattern of the participant, hand movements of the participant, frequently used phrases of the participant, a talking manner of the participant, or a voice pattern of the participant.
14. The method of claim 1, implemented on a communication system used by a second participant, wherein the communication context is a communication between the participant and the second participant, wherein the communication system is configured to retain communications between the participant and the second participant and to generate a private model for the communication context based on the retained communications.
15. A computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which program instructions, when read by a processor, cause the processor to: obtain a media stream associated with a participant, wherein the media stream depicts a real-time communication of the participant in a communication context; identify the communication context; obtain a personalized model of the participant when communicating in the communication context, wherein the personalized model is configured to identify a behavioral pattern of the participant; execute the personalized model on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the media stream matches the behavioral pattern of the participant according to the personalized model; and upon identifying a mismatch between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model, perform a responsive action.
16. The computer program product of claim 15, wherein the instructions, when read by the processor, cause the processor to train a first personalized model of the participant under a first communication context, and to train a second personalized model of the participant under a second communication context.
17. The computer program product of claim 15, wherein the communication context is selected from a group consisting of: a friendship relationship, a co-working relationship, a family relationship, a business relationship, a customer-client relationship, and a romantic relationship.
18. The computer program product of claim 15, wherein the communication context comprises a topic of the real-time communication.
19. The computer program product of claim 15, wherein the instructions, when read by the processor, cause the processor to determine an identity of the participant based on at least one of: a facial recognition method implemented on the media stream, an audio recognition method implemented on the media stream, metadata of the media stream, and tags relating to the participant that are attached to the media stream, wherein the communication context comprises the identity of the participant.
20. A system, the system comprising a processor and coupled memory, the processor being adapted to: obtain a media stream associated with a participant, wherein the media stream depicts a real-time communication of the participant in a communication context; identify the communication context; obtain a personalized model of the participant when communicating in the communication context, wherein the personalized model is configured to identify a behavioral pattern of the participant; execute the personalized model on at least a portion of the media stream to determine whether a behavioral pattern of the participant in the media stream matches the behavioral pattern of the participant according to the personalized model; and upon identifying a mismatch between the behavioral pattern of the participant in the media stream and the behavioral pattern of the participant according to the personalized model, perform a responsive action.
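The flow recited in claims 1 to 3 and 8 (one personalized model per participant and communication context, a threshold test for a mismatch, and a responsive action) can be illustrated with a minimal sketch. It rests on stated assumptions. First, it assumes that behavioral features, such as rates of face movements or voice-pattern statistics, have already been extracted from the media stream as fixed-length numeric vectors. Second, it replaces the classifier of claim 4 with a simple centroid-distance model. The names PersonalizedModel, Authenticator, train, mismatch and check are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, Sequence, Tuple


@dataclass
class PersonalizedModel:
    """Hypothetical stand-in for a personalized model: the centroid of the
    participant's behavioral feature vectors in one communication context."""

    centroid: Tuple[float, ...]
    threshold: float  # largest tolerated distance (cf. claim 3)

    @classmethod
    def train(cls, samples: Sequence[Sequence[float]], threshold: float) -> "PersonalizedModel":
        # Average each feature over feature vectors from past media records.
        if not samples:
            raise ValueError("at least one training sample is required")
        n = len(samples)
        centroid = tuple(sum(column) / n for column in zip(*samples))
        return cls(centroid, threshold)

    def mismatch(self, features: Sequence[float]) -> bool:
        # Mismatch when the observed behavior lies farther than the threshold.
        if len(features) != len(self.centroid):
            raise ValueError("feature vector has the wrong dimension")
        return dist(self.centroid, tuple(features)) > self.threshold


@dataclass
class Authenticator:
    # One model per (participant, communication context) pair (cf. claim 8).
    models: Dict[Tuple[str, str], PersonalizedModel] = field(default_factory=dict)

    def add_model(self, participant: str, context: str, model: PersonalizedModel) -> None:
        self.models[(participant, context)] = model

    def check(self, participant: str, context: str, features: Sequence[float]) -> str:
        model = self.models.get((participant, context))
        if model is None:
            return "no-model"  # no personalized model for this context
        if model.mismatch(features):
            return "alert"  # responsive action, e.g. alert or block (cf. claim 2)
        return "ok"


if __name__ == "__main__":
    # Illustrative two-feature vectors; the values are made up for the demo.
    model = PersonalizedModel.train([[1.0, 2.0], [3.0, 4.0]], threshold=1.0)
    auth = Authenticator()
    auth.add_model("participant-a", "co-working", model)
    print(auth.check("participant-a", "co-working", [2.0, 3.5]))  # ok
    print(auth.check("participant-a", "co-working", [5.0, 7.0]))  # alert
```

The centroid distance is used here only because it makes the threshold comparison of claim 3 explicit. A trained two-class classifier, as described in claims 4 and 5, would more plausibly serve as the personalized model in practice.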